
HIP: Enable Matrix cores for MMQ Kernels, Enable stream-K for CDNA 3 #14624


Open

wants to merge 17 commits into master

Conversation


@deepsek deepsek commented Jul 10, 2025

  • Added Matrix core (MFMA instruction) support for the MMQ kernels.

  • Enabled stream-K for CDNA 3 so it works with the MMQ kernels.

  • Removed the use of the hardcoded WARP_SIZE constant in the MMQ kernels.

  • NOTE: Any thoughts on removing all uses of hardcoded constants that are specific to NVIDIA (like WARP_SIZE) in order to support other GPUs? A minimal sketch of the idea follows this list.
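As a minimal sketch of the warp-size point (illustrative only; this is not the code in this PR, and the identifiers are made up): assuming NVIDIA warps are 32 lanes wide, AMD CDNA/GCN wavefronts are 64, and hipcc defines `__HIP_PLATFORM_AMD__` for AMD targets, a per-target constant can replace the literal 32:

```cpp
// Illustrative sketch, not the PR's implementation.
#if defined(__HIP_PLATFORM_AMD__)
#include <hip/hip_runtime.h>
constexpr int DEVICE_WARP_SIZE = 64; // AMD CDNA/GCN wavefront width (assumption)
#else
#include <cuda_runtime.h>
constexpr int DEVICE_WARP_SIZE = 32; // NVIDIA warp width
#endif

// Example kernel: one warp/wavefront per row, striding over the columns
// with DEVICE_WARP_SIZE instead of a hardcoded literal 32.
__global__ void scale_rows(float * data, int ncols, float factor) {
    const int row  = blockIdx.x;
    const int lane = threadIdx.x; // launch with blockDim.x == DEVICE_WARP_SIZE
    for (int col = lane; col < ncols; col += DEVICE_WARP_SIZE) {
        data[row * ncols + col] *= factor;
    }
}
```

Launched as `scale_rows<<<nrows, DEVICE_WARP_SIZE>>>(d_data, ncols, 2.0f)`, the same source works on both backends without baking in a warp width of 32.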

@JohannesGaessler @ggerganov
P.S. I am part of an AMD team actively working on enabling the AMD feature set in llama.cpp. We would like to get on a call to discuss some future PR plans for additional backends, flash attention changes, etc.

EDIT:
Updated to add some performance charts for the DeepSeekV3 model.

[Chart: Upstream vs ROCm Fork Development]

[Chart: MI300X vs H100 Throughput Test]

@github-actions github-actions bot added the Nvidia GPU (Issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library for machine learning) labels on Jul 10, 2025
@JohannesGaessler
Collaborator

I would be happy to get on a call with you to discuss AMD hardware support; my email address can be found on my GitHub page.

@ggerganov
Member

> P.S. I am part of an AMD team actively working on enabling the AMD feature set in llama.cpp. We would like to get on a call to discuss some future PR plans for additional backends, flash attention changes, etc.

@deepsek Thanks for the contribution and for reaching out. On topics related to the CUDA backend, @JohannesGaessler is the best person to consult with. For additional backends, @slaren can provide guidelines and advice. I'll be happy to provide input on any matters as well.

I am also available for a call - feel free to contact me.

@Dampfinchen

Dampfinchen commented Jul 11, 2025

Very nice to see the initiative. I assume the improvements made for CDNA will also carry over to the consumer side next year when UDNA releases. So this is exciting news for the future of AMD products!

@IMbackK
Collaborator

IMbackK commented Jul 12, 2025

This certainly is good news.

@JohannesGaessler
Collaborator

Sorry, I wanted to ask: @IMbackK, since you've been working on AMD support, are you interested in joining the discussion?

@IMbackK
Collaborator

IMbackK commented Jul 14, 2025

> Sorry, I wanted to ask: @IMbackK, since you've been working on AMD support, are you interested in joining the discussion?

Yes, certainly. It would help to avoid duplication of effort. I can be reached via email at uvos.xyz, user carl.

@deepsek deepsek requested a review from ngxson as a code owner July 15, 2025 16:53
@github-actions github-actions bot added the devops (improvements to build systems and github actions) label on Jul 15, 2025